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Alternatively, these embodiments can be 
implemented for showing an optimal data retrieval 
performance when the query leads to a relatively large 
number of hits according to the claims 7 to 10, 19, and 
30 to 33. By using bitmaps to represent result sets, 
memory consumption is reduced because a result flag 
instance consumes considerably more memory than a bit. 
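
The bitmap idea can be illustrated with a minimal Python sketch. It assumes, beyond what this passage states, that every record has a fixed position so that bit i of a bitmap stands for record i. The function names and the sample positions are hypothetical.

```python
def make_bitmap(positions):
    """Build a result-set bitmap: bit i is set if record i is a hit."""
    bitmap = 0
    for pos in positions:
        bitmap |= 1 << pos
    return bitmap


def count_hits(bitmap):
    """The number of hits equals the number of set bits."""
    return bin(bitmap).count("1")


def positions_of(bitmap):
    """List the record positions whose bit is set, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


first = make_bitmap([0, 2, 3])   # hypothetical hits of a first condition
second = make_bitmap([2, 5])     # hypothetical hits of a second condition

both = first & second            # Boolean AND of the two result sets
either = first | second          # Boolean OR of the two result sets
```

Each record costs one bit per result set in this representation, whereas a result flag instance carries several pointers.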

Another goal is to provide a method, data storage system, 
and computer program product that automatically select 
an optimal data retrieval implementation for a specific 
number of hits. 

This goal is met by embodiments of the invention 
according to the claims 11, 20, 22, and 35. 

Further, these embodiments can be implemented to 
dynamically tune threshold values to enable a selection 
of the optimal data retrieval implementation in a 
changing environment according to the claims 12 to 15, 
20, 23 to 25, and 36 to 38. 
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Another goal is to provide a method, data storage system 
and computer program product to reduce memory 
consumption for data retrieval when the Boolean 
expression has a simpler structure. 

This goal is met by embodiments of the invention 
according to the claims 16, 21, and 39. 

Further, these embodiments can be used to 
implement a "Lean AND" mechanism according to the 
claims 17, 18, 21, 40, and 41. 

The aspects of the invention will be realized and 
attained by means of the elements and combinations 
particularly pointed out in the appended claims. Also, 
the described combination of the features of the 
invention is not to be understood as a limitation, and all 
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METHOD AND COMPUTER SYSTEM FOR QUERY PROCESSING 

Field of the invention 

The present invention generally relates to electronic 
data processing, and more particularly, relates to 
methods, computer program products and systems for data 
retrieval. 

Background of the invention 

The patent application PCT/EP02/01026 discloses a data 
structure that can be used in a data storage system to 
support a fast query process. However, the referenced 
application discloses the fast query process with 
respect to a single condition. Normally, Boolean 
expressions include multiple complex select statements 
that have to be evaluated. 

Summary of the Invention 

A goal of the invention is to provide a method, data 
storage system and computer program product to retrieve 
data from the data storage system in response to a 
query that includes a Boolean expression having 
multiple conditions or even nested select statements. 

This goal is met by embodiments of the invention 
according to the claims 1, 19, 26, and 34. 

Further, these embodiments can be implemented for 
showing an optimal data retrieval performance when the 
query leads to a relatively small number of hits 
according to the claims 5, 6, 19, 28, and 29. By 
instantiating result flag instances from a result flag 
class, corresponding result sets are created. Boolean 
operators can be applied to result sets without a need 
to access corresponding data records. 
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FIG. 11 illustrates how two result sets can be merged 
into a single result set when applying the 
Boolean AND operator; 
FIG. 12 illustrates a first implementation for the 
result flags and the result sets; 
FIG. 13 illustrates a second implementation for the 
result flags and the result sets; 
FIG. 14 illustrates how result flags relate to 
corresponding IC-anchors in the second 
implementation; 
FIG. 15 illustrates how Boolean operators can be 
applied to the result sets in the second 
implementation; 
FIG. 16 is a simplified block diagram of software 
components of the computer system to dynamically 
select a data retriever implementation; 
FIG. 17 illustrates initialising threshold values; and 
FIG. 18 illustrates dynamically adjusting threshold 
values. 
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Detailed Description of the Invention 

In the following, an example of a data (storage) system 
is described that can be used together with an 
embodiment of the invention. The data system may be 
implemented according to a relational database model. 
However, the system is not limited to use within the 
constraints of a known relational database 
architecture. The elements of the data system roughly 
translate to the known nomenclature of the relational 
database theory as follows (with the definitions used 
with an embodiment of the invention on the left): 

InfoSystem <-> Management System 
InfoArea <-> Database 
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the features can be combined in other constellations 
without departing from the spirit of the invention. It 
is to be understood that both the foregoing general 
description and the following detailed description are 
exemplary and explanatory only and are not restrictive 
of the invention as described. 

Brief Description of the Drawings 

FIG. 1 is a simplified block diagram of a computer 
system that can be used with an embodiment of 
the invention; 
FIG. 2 is a diagram of a static hierarchical data 
structure used in one embodiment of a data 
storage system according to the invention; 
FIG. 3 schematically shows the initial state of an 
anchor as used in the data structure; 
FIG. 4 illustrates the use of the anchor for the 
implementation of an InfoType; 
FIG. 5 illustrates adding an InfoCell to the data 
structure; 
FIG. 6 illustrates the structure that is obtained when 
multiple InfoTypes are put into the data 
structure; 
FIG. 7 shows an InfoCourse that contains data; 
FIG. 8 illustrates multiple InfoCourse paths in the 
data structure; 
FIG. 9 illustrates how to retrieve data from the data 
storage system when operated according to the 
invention; 
FIG. 10 illustrates how two result sets can be merged 
into a single result set when applying the 
Boolean OR operator; 
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FIG. 1 is a simplified block diagram of a computer 
system 990 that can be used with an embodiment of the 
invention. The computer system 990 includes multiple 
computing devices (e.g., first computing device 901 and 
second computing device 902) that communicate over a 
network 999, such as a local area network (LAN), wide 
area network (WAN), the Internet, or a wireless 
network. 

For example, the second computing device 902 may 
be a backend system, such as a database system, a file 
system or an application system, that stores data. The 
data can also be stored anywhere inside or outside of 
the computer system 990. 

The first computing device 901 may be used to 
compose Boolean expressions 500 to be used in a QUERY 
for retrieving selected data from the second computing 
device 902. For example, the first computing device 901 
may be a front end computer that provides a graphical 
user interface (GUI) to a user. 

There can be various ways in which the data 
storage system 902 receives the QUERY, dependent on the 
interfaces offered for the data storage system 902. For 
example, in case of using an SAP R/3 based system, the 
SAP Remote Function Call (RFC) functionality provided 
by the ABAP kernel can be used. An application 
programming interface (API) can be implemented as a 
collection of ABAP Function Modules. The API uses the 
RFC functionality to communicate remotely with the data 
storage system. An SAP R/3 based application uses the 
API for receiving parameters that are passed to the 
data storage system 902. The corresponding results are 
then returned as ABAP parameters. A selection query is 
filled into an internal table in ABAP and can be 
rapidly processed by the data storage system since the 
query is already pre-structured. 
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InfoCluster <-> Table 
InfoType <-> Attribute 
InfoCourse <-> Data record 
InfoCell <-> Field 

Further definitions of terms, as used hereinafter: 

Boolean operators: 
operators used in Boolean statements, e.g., AND, OR. 

Relational operators: 
operators used in relational statements, e.g., 
< (less than) 
<= (less than or equal to) 
> (greater than) 
>= (greater than or equal to) 
= (equal to) 
<> (not equal to) 
or any other operator that can be used to express 
a relation between two values. 

Condition: 
relational statement comparing data, such as numerical 
data or alphanumeric data, using one or more relational 
operators. 

Boolean expression: 
statement including multiple conditions that are 
combined using Boolean operators. 
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InfoAreas through a linking element, which will be 
described hereinafter as an anchor. These InfoAreas can 
for example refer to logical units of the InfoSystem. 

Each InfoArea is connected via a linking element 
(again an anchor as described hereinafter) to an 
InfoCluster. In turn, each InfoCluster is connected to 
at least one InfoCourse and at least one InfoType, 
through respective linking elements, such as anchors. 
The InfoType can be seen as an attribute of a table; an 
InfoCourse always starts in an InfoCluster. If an 
InfoCourse stays within an InfoCluster with its 
addressed InfoCell elements corresponding to fields of 
a table, then the InfoCourse is similar to a record of 
a table, such as a relational database table. 

Under the InfoCourse and the InfoType the InfoCell 
is found; this is the element on the lowest level in 
the hierarchical structure. On the creation of an 
InfoType an anchor is created that is also an InfoCell. 
This anchor has the function to represent the structure 
of following InfoCell elements (cf. FIG. 3). 

For the implementation of the levels below the 
InfoArea level, i.e. the InfoCluster, the InfoCourse, 
the InfoType, and the InfoCell levels, use is made of a 
data element according to the invention as shown in 
FIG. 3. In this example, the data element is shown 
schematically as an anchor, and is provided with a 
number of pointers. The pointers of the first pair are 
labelled LVR and RVR (Left Vertical Ring, respectively 
Right Vertical Ring), the pointers of the second pair 
are labelled LHR and RHR (Left Horizontal Ring, 
respectively Right Horizontal Ring), the pointers of 
the third pair are labelled LSR and RSR (Left Self 
Ring, respectively Right Self Ring), and the single 
pointer is labelled IF (InFormation bridge). Note that 
the pointers LSR, RSR and IF are in principle optional. 
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In general, any interface or meta format can be 
used to post a query to the data storage system. A 
pre-structured query is useful but not necessary. The query 
may also be coded in XML or simply be passed to the 
data storage system as a string that has to be parsed 
within the data storage system. 

FIGs. 2 to 8 explain details of one embodiment of 
the data storage system 902. For example, as described 
in the patent application PCT/EP02/01026, the data 
storage system 902 can be configured as a fast cache 
with all data structures residing in its main memory. 
The Boolean expression 500 can include at least a first 
portion 501 and a second portion 502, each portion 
representing a selection condition of any degree of 
complexity applicable to the data structures in the 
main memory. Further portions may be included. The 
portions are combined through logical or relational 
operators (OP). 

FIG. 2 is a diagram of a static hierarchy structure 
used in one embodiment of the data storage system 902. 
Each box in the structure corresponds to an instance of 
the data type that is used as a label for the box. 
Multiple overlapping boxes illustrate multiple 
instances of the same data type. A single arrow between 
instances of different data types stands for an 
arbitrary number of arrows between multiple instances 
at each corresponding level of the structure. In the 
following, the data type labels are used to refer to 
corresponding instances of the data type. The highest 
level in the structure is the InfoSystem level. Down 
from the top level one or more InfoAreas are connected 
to the InfoSystem. The InfoSystem provides algorithms 
necessary to operate the data storage system in run 
time. The InfoSystem is connected to any number of 
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The order in which the InfoCells are organized 
depends on their value. In case of a smaller value, the 
InfoCell is placed on the LVR side, otherwise on 
the RVR side. This practice is well known in the art as 
binary tree building. Preferably, the binary trees are 
organized as balanced or AVL trees, methods which are 
well known in the art. These kinds of trees minimise 
the number of levels within the tree structure, so as 
to minimize access time. Preferably, all tree 
structures within the data system are dynamically 
balanced in use, so as to guarantee optimum access 
times. 
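
The ordering rule can be sketched in Python as follows. This is a minimal illustration, not the patented layout: it assumes that the anchor reaches the first InfoCell through its RVR pointer and that end pointers refer back to the anchor instead of a nil value. AVL rebalancing is deliberately left out, and the class and function names are hypothetical.

```python
class Anchor:
    """Entry point of one InfoType tree; initially points to itself."""
    def __init__(self):
        self.lvr = self
        self.rvr = self     # leads to the first InfoCell once one exists


class InfoCell:
    """Tree node with an LVR/RVR pointer pair."""
    def __init__(self, value, anchor):
        self.value = value
        self.lvr = anchor   # end pointers refer to the anchor, never to None
        self.rvr = anchor


def insert(anchor, value):
    """Insert a value: smaller values go to the LVR side, others to RVR."""
    cell = InfoCell(value, anchor)
    if anchor.rvr is anchor:          # empty tree: this is the first InfoCell
        anchor.rvr = cell
        return cell
    node = anchor.rvr
    while True:
        side = "lvr" if value < node.value else "rvr"
        child = getattr(node, side)
        if child is anchor:           # reached an end pointer
            setattr(node, side, cell)
            return cell
        node = child


def in_order(anchor):
    """Return all stored values in ascending order (no rebalancing done)."""
    out, stack, node = [], [], anchor.rvr
    while stack or node is not anchor:
        while node is not anchor:
            stack.append(node)
            node = node.lvr
        node = stack.pop()
        out.append(node.value)
        node = node.rvr
    return out
```

Using the anchor as the terminator means that a traversal can detect the end of a branch by comparing against the anchor, so no pointer is ever undefined.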

FIG. 6 illustrates the structure that is obtained when 
all InfoTypes of the table A are put into the data 
structure. In total, three InfoTypes are present: age, 
first name, and weight. Note that the end pointers of 
each last element in the respective trees are not 
shown. Under each anchor of the InfoType, the InfoCells 
are organised in a binary tree. The InfoCluster points 
to an anchor which in turn points to a first InfoType. 
The first InfoType in turn points to the other two 
InfoTypes. Each InfoType points to an anchor. The 
anchor has the additional function of a marker that 
can be used by an access or query process as a break or 
return sign. 

To complete the implementation of the table, the 
relations between the InfoTypes have to be made. To this 
end an InfoCourse is introduced. 
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FIG. 7 shows the InfoCourse that contains the data for 
a row of the table A. Use is made of the LHR and RHR 
pointers. The end pointers again point back to the 
anchor of the InfoCourse to maintain the ring 
structure. Note that the InfoCourse also forms a binary 
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In the initial state, as shown in FIG. 3, all pointers 
point to the anchor. This initial state is also the 
simplest of possible ring structures. Every pointer in 
the structure has a valid address, and cases of an 
undefined pointer (nil pointer) are avoided. 
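
A minimal Python sketch of this initial state is given below. The class name and the lower-case attribute names are hypothetical stand-ins for the pointers LVR, RVR, LHR, RHR, LSR, RSR and IF (named `inf` here because `if` is a reserved word in Python).

```python
class DataElement:
    """Data element whose pointers all refer to the element itself at first."""

    POINTERS = ("lvr", "rvr", "lhr", "rhr", "lsr", "rsr", "inf")

    def __init__(self):
        # Initial state: every pointer holds a valid address (the anchor).
        for name in self.POINTERS:
            setattr(self, name, self)

    def is_initial(self):
        """True while every pointer still points back to this element."""
        return all(getattr(self, name) is self for name in self.POINTERS)
```

Because no pointer is ever nil, code that follows a ring needs no null check and can stop when it arrives back at the anchor.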

In the following, example data is used as shown in 
table A of FIG. 8. The table includes data regarding 
first names, ages and weights. For this table an 
InfoCluster is generated. Furthermore, three InfoTypes 
are generated to represent respectively first names, 
ages, and weights. 

FIG. 4 illustrates the use of the data element for the 
implementation of the InfoType. In the InfoType, 
semantic information is included, such as the data 
type (in this example "INTEGER"), field name (in this 
example "age"), etc. The InfoType has an anchor 
associated with the InfoType. The anchor points with 
its RVR pointer to the actual information carrier, that 
is the InfoCell. The InfoCell is, as described above, the 
lowest level entity within the data system. The 
InfoCell holds the information, as shown in FIG. 4; in 
this example "age is 30 in INTEGER". 

As described above, the InfoCell is provided with 
an LVR/RVR pointer pair. As shown in FIG. 4, the RVR 
pointer of the InfoCell points towards the anchor, and 
also the LVR pointer points to the anchor. As a result, 
the ring configuration of the anchor is maintained. 

FIG. 5 illustrates how a further InfoCell is added to 
the data structure. The InfoCell (with the value "25") 
is inserted in the LVR ring after the first InfoCell. 
The LVR and RVR pointers of the InfoCell point to the 
anchor, so as to maintain a closed ring. 
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portion 501 (cf. FIG. 1) and the second portion 502. If 
further portions are included they are also subject to 
decomposition. Each portion includes at least one 
condition that has to be fulfilled by any InfoCourse 
that is selected by the original query. The conditions 
relate to InfoTypes. 

The data storage system 902 then determines a 
result set for each portion. In the example, a first 
result set 361 includes result flags (C-FLAG1) in 
compliance with the first portion 501 and a second 
result set 362 includes further result flags (C-FLAG2) 
in compliance with the second portion 502. A result 
flag is used to indicate whether a specific InfoCourse 
fulfils a condition in a corresponding portion. Each 
result flag relates (bold up-arrows) to a result 
identification number 351, 352 of the corresponding 
result set 361, 362 to which it belongs. The result 
flags within a result set are also interrelated (dashed 
arrows). Further, each result flag relates (bold 
left-arrows) to the corresponding InfoCourse anchor 
(IC-anchor) 310, 311, 312, 313 of the InfoCourse fulfilling 
the corresponding condition. 

The two result sets 361, 362 can originate from 
the evaluation of a complex Boolean expression, where 
the first result set 361 can be the result of one 
bracket including potentially any Boolean 
sub-expression as sub-query. The same is true for the 
second result set 362, e.g. representing another 
bracket of the Boolean expression. 
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FIGs. 10 and 11 illustrate how the first and the second 
result sets 361, 362 can be merged into a single result 
set 363 when applying corresponding Boolean operators 
to the result flags of the corresponding InfoCourse 
anchors. One implementation example using pointer lists 






2003P00396 EP 11 



tree, sorted by the ID numbers of the InfoTypes. Note 
that the ID numbers of the InfoTypes are unique. For 
example, integer values are used for the ID numbers. 

FIG. 8 illustrates all the InfoCourse paths (for 
example implemented using pointers) for the table A. 
Note that all InfoCells have been provided in the top 
section with their respective InfoType ID number, over 
which the binary tree configuration of the InfoCourse 
via the LHR/RHR pointers is organized. Elements that 
belong to an InfoType are connected by solid arrows. 
Elements that belong to an InfoCourse are connected by 
dashed arrows. 

When five million records with 100 attributes 
(e.g., 100 columns of a relational database table) are 
loaded into the data storage system 902, then five 
million InfoCourse trees (InfoCourses) exist, one for 
each record. Each InfoCourse includes 100 nodes. Each 
InfoCourse has a corresponding InfoCourse anchor 
pointing to the respective InfoCourse. In other words, 
when loading five million records into the data storage 
system 902 then also five million InfoCourse anchors 
exist. 

FIG. 9 illustrates how a computer implemented method 
can be used to retrieve data from the data storage 
system 902 when operated according to the invention. It 
is assumed that the data storage system 902 stores the 
data using the data structure as described in FIGs. 2 
to 8. Note that in this data structure each InfoCourse 
300, 301, 302, 303 has an InfoCourse anchor 310, 311, 
312, 313. 

Once the Boolean expression is received by the 
data storage system 902 (cf. FIG. 1), a parser 
decomposes the Boolean expression 500 into the first 
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Further, it is checked whether a corresponding 
C-FLAG1 result flag exists. If not, the data storage 
system proceeds with the next C-FLAG2 in the second 
result set 362. If a corresponding C-FLAG1 exists, then 
one of these two result flags, either in the first or 
in the second result set, is deleted to avoid 
intersections. 

To find out whether a corresponding C-FLAG1 
exists in the first result set 361, for example, the 
data storage system moves along a circular structure 
that is used to relate the result flags to their 
corresponding IC-anchor. 

After having processed all result flags of the 
first and second result sets, only C-FLAG1 result flags 
remain. The combination with OR means to link the two 
result sets together to one result set. In this 
example, in the end, all result flags have the result 
ID 351 of the first result set 361. During the above 
described procedure the counters for the number of 
result flags in each result set are continuously 
updated (e.g., decremented when result flags are 
deleted). Therefore, the number of result flags in the 
"final" result set is the sum of the counters of the 
first and second result sets just prior to linking them 
together. This count result can be reported to an 
application as the number of InfoCourses (records) 
matching the first portion 501 or the second portion 
502 of the Boolean expression 500. 
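
The counting logic of the OR combination can be sketched in Python as follows. As a simplification, each result set is modelled as a Python set of IC-anchor numbers rather than as linked result flags, and the membership test stands in for the walk along the circular structure. The function name and the assignment of anchors to result sets in the usage note are hypothetical.

```python
def merge_or(first, second):
    """OR-combine two result sets in place and return (result, hit count).

    first, second: sets of IC-anchor numbers, one entry per result flag.
    """
    counter_first, counter_second = len(first), len(second)

    for anchor in list(second):            # run through the second result set
        if anchor in first:                # a corresponding C-FLAG1 exists
            second.discard(anchor)         # delete the duplicate C-FLAG2
            counter_second -= 1            # keep the counter up to date

    # Just prior to linking, the final count is the sum of both counters.
    hits = counter_first + counter_second

    first |= second                        # link: remaining flags get the first ID
    second.clear()
    return first, hits
```

For example, with three flags on anchors 310, 311 and 313 in the first set and two flags on anchors 311 and 312 in the second set, one duplicate is deleted and the reported count is 3 + 1 = 4.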

The "final" result set may represent a real final 
result set or an intermediate result when the Boolean 
expression 500 includes further portions. In this case, 
it is combined again with further result sets that 
correspond to the further portions. A complex query 
consisting of several nested sub-queries may be 
evaluated recursively by combining the result sets of 
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is explained in more detail in FIG. 12. Another 
implementation example using bitmaps is explained in 
more detail in FIGs. 13 to 15. 

In the example of FIG. 10, the Boolean expression 500 
combines the first and second result sets with a 
Boolean OR operator. 

For the combination, the InfoCourses or the IC-anchors 
are not needed anymore. The number of result 
flags in each result set is known by, for example, 
incrementing a corresponding counter when creating the 
result flags. 

In one implementation, the data storage system 
runs through one of the result sets from the first to 
the last result flag. Advantageously, the result set 
including the lowest number of result flags is chosen 
because of a shorter processing time, which becomes 
more relevant in the case of Boolean AND combinations. 
The first result set 361 includes three result flags 
(C-FLAG1) and the second result set 362 includes two 
result flags (C-FLAG2). Therefore, the data storage 
system starts with the second result set 362 and then 
processes the first result set 361. In one 
implementation, for each IC-anchor to which a result flag 
C-FLAG1 or C-FLAG2 relates, a corresponding result 
flag R-FLAG is generated in the third result set 363 
having result identification number 353. 

In another implementation, one can also use the 
first or second result set for storing the result of 
the Boolean OR operation. 

For example, when running through the second 
result set 362, each C-FLAG2 can be "renamed" into 
C-FLAG1. The result ID 352 of the second result set is 
set to the result ID 351 of the first result set. 
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At the end of this filtering procedure, the second 
result set 362 includes the "final" result set and, 
therefore, plays the role of the third result set 363. 
The first result set 361 is not needed any more and can 
be deleted. 
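
The filtering procedure for the Boolean AND case can be sketched in Python as follows. Result sets are again simplified to Python sets of IC-anchor numbers; the function name and the swap of arguments so that the smaller set is the one that gets filtered are illustrative assumptions.

```python
def filter_and(first, second):
    """AND-combine two result sets; the smaller one is filtered in place.

    Returns (surviving result set, number of hits).
    """
    if len(first) < len(second):           # always run through the smaller set
        first, second = second, first

    counter_second = len(second)
    for anchor in list(second):
        if anchor not in first:            # no corresponding flag in the other set
            second.discard(anchor)         # delete the flag
            counter_second -= 1            # and reduce the counter

    first.clear()                          # the other set is no longer needed
    return second, counter_second
```

Running through the smaller set bounds the number of lookups by the maximum possible size of the intersection.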

Again, with each deletion of a result flag, the 
corresponding counter is reduced accordingly. 
Therefore, the counter of the second result set always 
contains the current number of result flags C-FLAG2, 
which, at the end of the filtering procedure, 
corresponds to the number of hits for the query and may 
be reported to an application. 

As in the Boolean OR case, the "final" result set 
may represent a real final result set or an 
intermediate result when the Boolean expression 500 
includes further portions that are subject to further 
combinations. 

FIG. 12 illustrates a first implementation for the 
result flags C-FLAG1, C-FLAG2 and the result sets. 
In this first implementation, the data storage 
system instantiates an instance (C-FLAG1, C-FLAG2) of a 
result flag class for each result flag. Multiple result 
flags for one InfoCourse 300 (record) are connected in 
a ring structure 800. The ring structure 800 relates 
330, 320 to the corresponding IC-anchor 310. 
Advantageously, a docket element (D-FLAG) is used. The 
docket element represents a counterpart of the IC-anchor 
310 on the side of the result flags. One 
advantage is that the docket element is decoupled from 
the IC-anchor in the sense that it is derived from a 
different class than the IC-anchor. Therefore, it can 
provide different functions than the IC-anchor. These 
functions can be used by the other result flags because 
the docket element is instantiated from the same class 
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sub-queries with the result sets of other sub-queries. 
This continues until all levels of the Boolean 
expression are resolved. At the end, a single result 
set (e.g., result set 363) is left. Its number of 
result flags corresponds to the number of hits for the 
whole query (Boolean expression 500). 
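
The recursive resolution of nested sub-queries can be sketched in Python as follows. The nested-tuple representation of the Boolean expression, the function name, and the use of Python sets as result sets are assumptions made for illustration only.

```python
def evaluate(expr, leaf_results):
    """Recursively resolve a nested Boolean expression into one result set.

    expr is either a condition name (str) or a tuple (op, left, right)
    with op being "AND" or "OR". leaf_results maps each condition name
    to the set of IC-anchor numbers that fulfil the condition.
    """
    if isinstance(expr, str):
        return set(leaf_results[expr])     # result set of a single condition
    op, left, right = expr
    a = evaluate(left, leaf_results)       # result set of the left bracket
    b = evaluate(right, leaf_results)      # result set of the right bracket
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unsupported operator: {op}")
```

With `leaf_results = {"C1": {1, 2, 3}, "C2": {2, 3, 4}, "C3": {5}}`, the expression `("OR", ("AND", "C1", "C2"), "C3")` resolves to a single result set whose size is the number of hits for the whole query.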

In the example of FIG. 11, the Boolean expression 500 
combines the first and second result sets with a 
Boolean AND operator. 

Again, the data storage system knows the number of 
result flags in each result set from the corresponding 
result counters and starts with processing the result 
set with the lowest number of result flags. This is 
advantageous in the case of Boolean AND combinations 
because the total number of result flags can be at most 
as large as the smallest result set. In the example of 
FIG. 11 the second result set 362 is the smaller one. 

For each IC-anchor to which a result flag C-FLAG1 
and also a result flag C-FLAG2 relate, a 
corresponding result flag R-FLAG is generated in the 
third result set 363. 

In one implementation, one can also use the first 
or second result set for storing the result of the 
Boolean AND operation. 

For each result flag C-FLAG2 of the second result 
set 362, the data storage system checks whether a 
corresponding result flag C-FLAG1 exists in the first 
result set 361. If so, the result flag C-FLAG2 is kept and the 
data storage system proceeds with the next result flag 
of the second result set. If no corresponding result 
flag C-FLAG1 is found in the first result set, then the 
result flag C-FLAG2 in the second result set 362 is 
deleted. 
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In the example, the first result set 361 is 
implemented by the first pointer list PL-1 that 
includes the result flag pointers C-FLAG1 and has the 
result ID 351. The second result set 362 is implemented 
by the second pointer list PL-2 that includes the 
result flag pointers C-FLAG2 and has the result ID 352. 
The docket elements formally are also linked in a 
pointer list PL-D having its own result ID 350. 

Several result sets may exist simultaneously. On 
the level of a docket element D-FLAG, the result flags 
can be linked in the circular structure 800 using 
pointers, such as pointer pSmallId and pointer 
pLargeId. The pointer names indicate that the result 
flags in the circular structure 800 can be sorted by 
result ID. The circular structure 800 can be run 
through in both directions, e.g. to find the result 
flag of a particular result set. Sorting the result 
flags in the circular structure 800 by result ID helps 
to decide in which direction the circular structure 
should be searched for a fast identification of a 
certain result ID. 

FIGs. 10 and 11 describe an implementation for 
applying the Boolean OR and AND operators to two result 
sets. These operators may be combined with a Boolean 
NOT operator. In this case, the data storage system 
runs through the docket elements of all IC-anchors and 
instantiates a result flag in a new result set each 
time there is no result flag in the original 
result set to which the NOT operator is applied. At the 
end of the procedure the original result set is not 
used any more and can be deleted. Note that the 
InfoCourses are not needed to perform the inversion. 
Only the IC-anchors are used. 
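
The inversion can be sketched in Python as follows. The iterable of all IC-anchor numbers stands in for running through the docket elements, result sets are simplified to Python sets, and the function name is hypothetical.

```python
def invert(all_anchors, original):
    """Apply Boolean NOT to a result set.

    all_anchors: iterable of every IC-anchor number in the data structure.
    original: set of IC-anchor numbers flagged in the original result set.
    Returns (new result set, number of hits).
    """
    inverted = set()
    for anchor in all_anchors:             # run through all docket elements
        if anchor not in original:         # no flag in the original result set
            inverted.add(anchor)           # instantiate a flag in the new set
    original.clear()                       # original result set can be deleted
    return inverted, len(inverted)
```

Only the anchor numbers are touched here, which mirrors the statement that the InfoCourses themselves are not read during the inversion.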

The number of hits as well as some or all of the 
InfoCourses that match the query may be returned to an 
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as the result flags. The decoupling allows instances 
from the result flag class to consume less memory than 
a corresponding IC-anchor that has, for example, more 
pointers, additional administrative information (e.g., 
number of elements in a substructure), methods that 
operate on attributes, such as "sort elements" or 
"balance tree", etc. The docket element D-FLAG has a 
docket pointer 330 pointing at the corresponding IC-anchor 
310, whereas the IC-anchor 310 has an anchor 
pointer 320 pointing at the corresponding docket 
element. Using the ring structure 800 the data storage 
system can quickly identify any result flag related to 
a specific IC-anchor. 

To summarize, each InfoCourse has one IC-anchor 
that relates to a corresponding docket element. That 
is, an InfoCourse (record) 300 is represented by an IC-anchor 
310 and the corresponding docket element D-FLAG. 
The docket element is the docking point for the result 
flags C-FLAG1, C-FLAG2. A result flag semantically 
plays the role of a dynamic flag. If a result flag is 
connected to a docket element, the InfoCourse, which is 
represented by the docket element, has been selected. 
That is, it fulfils one or more conditions of the 
original query. 

Multiple result flags that relate to different IC-anchors 
may be linked together in a pointer list by 
means of pointers (e.g., pUp and pDown). This is also 
valid for the docket elements, since technically 
speaking they are also instances of the result flag 
class. A linear list of result flags is called a result 
set. Each result set is identified by a result ID. A 
result set flags a subset of InfoCourses that comply 
with at least one portion of the Boolean expression 500 in 
the query. 
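
The relationship between IC-anchor, docket element and result flags can be sketched in Python as follows. This is a simplified illustration: the ring is singly linked, the per-result-set pointer lists are omitted, and all class, attribute and function names are hypothetical.

```python
class ICAnchor:
    """Stand-in for an InfoCourse anchor."""
    def __init__(self, number):
        self.number = number
        self.docket = None               # plays the role of anchor pointer 320


class ResultFlag:
    """Result flag; docket elements are instances of the same class."""
    def __init__(self, result_id, anchor=None):
        self.result_id = result_id
        self.anchor = anchor             # docket pointer 330 (docket elements only)
        self.ring_next = self            # ring of flags for one InfoCourse


def attach_docket(anchor, docket_result_id=350):
    """Create the docket element (D-FLAG) for an IC-anchor."""
    docket = ResultFlag(docket_result_id, anchor)
    anchor.docket = docket
    return docket


def add_flag(anchor, result_id):
    """Dock a new result flag into the ring of the anchor's docket element."""
    flag = ResultFlag(result_id)
    docket = anchor.docket
    flag.ring_next = docket.ring_next
    docket.ring_next = flag
    return flag


def find_flag(anchor, result_id):
    """Walk the ring to find the flag of a given result set, or None."""
    docket = anchor.docket
    node = docket.ring_next
    while node is not docket:
        if node.result_id == result_id:
            return node
        node = node.ring_next
    return None
```

The flag objects hold only an ID and a ring pointer, which reflects the point that result flag instances can be kept lighter than the IC-anchors they refer to.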
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assume that there are 20,000 result flags in the result 
set. If InfoCourse 15,390 is requested as an offset and 
no interval pointers are available, then it is 
advantageous to start at the bottom of the result set 
(result flag 20,000) and run through 20,000 - 15,390 + 
1 = 4,611 result flags instead of starting at the top 
and running through 15,390 result flags. The same is true 
when using interval pointers. 
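
The arithmetic above can be checked with a short Python sketch. The function name and the convention of counting the start flag and the target flag as visited are assumptions chosen to match the numbers in the text; positions are 1-based.

```python
def nearest_start(total, offset, interval_pointers=()):
    """Pick the closest entry point to a requested offset in a result set.

    total: number of result flags in the result set.
    offset: 1-based position of the requested result flag.
    interval_pointers: optional positions that can be jumped to directly.
    Returns (start position, number of result flags run through).
    """
    starts = [1, total, *interval_pointers]          # top, bottom, pointers
    start = min(starts, key=lambda s: abs(offset - s))
    return start, abs(offset - start) + 1


# Without interval pointers, offset 15,390 is reached faster from the bottom.
print(nearest_start(20000, 15390))
# With hypothetical quarter pointers, the run is much shorter still.
print(nearest_start(20000, 15390, (5000, 10000, 15000)))
```

The first call reproduces the 4,611 flags from the example; the second shows how interval pointers shorten the sequential part of the search.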

In the previously explained implementation, the 
Boolean AND operator was applied to two result sets. 
In another implementation a "Lean AND" can be 
implemented in case only one result set exists as a 
result of one portion of the Boolean expression and 
this result set is to be combined with a single 
condition through a Boolean AND operator. The query may 
have a syntax like: (<complex subquery>) AND condition 
C1. Also for multiple non-nested conditions combined 
with AND at the same bracket level the "Lean AND" can 
be used. A syntax example for this kind of flat Boolean 
expression is: C1 AND C2 AND ... AND Cn, where C1 to Cn are 
conditions. 

In these examples, only one result set exists and 
one or more conditions are to be combined with the 
Boolean AND operator. 

Assume a Boolean expression, such as: 

C1 AND C2 AND ... AND Cn. 

As explained before, the data storage system 902 
is able to quickly find out which of the conditions C1 
to Cn has the lowest number of hits, that is, the 
highest selectivity. The total number of hits in the 
intersection set of all conditions cannot be larger 
than the number of hits for the condition with the 
highest selectivity. 

Therefore, the data storage system creates a 
35 result set for the condition with the highest 
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application. As an example, assume that 20.000 
InfoCourses are found. That is, the final result set
contains 20,000 result flags. If an application
requests the next 20 InfoCourses after the 5,390th
InfoCourse from the data storage system, then the
request can be satisfied by using the final result set.
The result flag 5,390 (offset) is located by running
down the final result set and counting the result flags
until the offset result flag at position 5,390 is
reached. The next 20 InfoCourses are read from the
corresponding tree structures (e.g., by using the
IC-anchors that relate to the corresponding docket
elements). The retrieved values may be serialized, for
example, into a Send-Buffer-Structure or any other kind
of appropriate communication data structure. Any type
of transport format and/or rearrangement,
concatenation, etc. of data may be used for the
Send-Buffer-Structure (e.g., the use of fixed lengths).
Preferably, the application knows the data format
provided by the data storage system to ensure stable
communication.

For a fast localisation of a specific InfoCourse
(e.g., number 5,390) it is useful to subdivide a result
set into intervals. One can use an interval pointer
which points to the result flag in the middle of the
result set (e.g., result flag 10,000 of 20,000) or to
any other sub-interval of the result set, such as
quarters. According to the offset requested by the
application, the data storage system can jump to the
nearest interval pointer and then sequentially run
through only a part of the result set (e.g., upwards or
downwards) and count until the requested result flag
(e.g., docket element D-FLAG) has been reached. It is
useful to choose the direction having the shortest
distance to the requested offset position. For example,
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If the Boolean expression has only one single 
condition, result sets are not necessary. Result sets
become valuable in case of a combination of
conditions in a corresponding query. In the particular
case of only one condition, the count result for the
total number of hits is obtained by means of the tree
structures as described in FIGs. 2 to 8. If InfoCourses
have to be returned to an application, this can also be
done by using the tree structures. Instead of a result
set identifying the matching InfoCourses, the tree nodes
as such are used. For example, instead of running
through a result set to visit all matching InfoCourses
and collect the data into a Send-Buffer-Structure, only
the matching sub-tree structures identified by the
corresponding start pointers are traversed. As soon as
the offset InfoCourse is found, only the number of
InfoCourses that is to be returned has to be visited.
For example, when 10 InfoCourses have to be returned to
an application, only 10 nodes from the offset
InfoCourse on have to be traversed in the corresponding
InfoType sub-tree. From each node in an InfoType tree
the corresponding anchor object can be reached, and
from the anchor each attribute value of the given
InfoCourse can be reached.






FIGs. 13 to 15 illustrate a second implementation for
the result flags C-FLAG1, C-FLAG2 and the result sets.

This second implementation is appropriate for very
large result sets, where the first implementation would
require many result flag instances that consume a lot
of memory space of the data storage system.

FIG. 13 illustrates three bitmaps BM-n, BM-n+1, BM-n+2.
In the example, the start of bitmap BM-n coincides with
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selectivity, then runs through all result flags of the 
result set and checks for each result flag if the
remaining conditions are fulfilled by the corresponding
InfoCourse or not. In this implementation the
InfoCourses are needed to check, for example, a
condition NAME_FIRST = 'Peter'. The data storage system
uses the relation from the result flag through the
docket element to the corresponding IC-anchor, which
points at the corresponding InfoCourse. The
corresponding InfoCourse tree is then searched for the
InfoType values according to the remaining conditions.

In this implementation, a second result set does not
need to be checked against the result set because the
checking is directly performed on the related
InfoCourses. As a consequence, the time to instantiate
all the result flags of a second result set is saved by
directly searching the InfoCourses matching the result
set (already the most selective condition) and checking
directly if the corresponding values match or not.

For each result flag this check is performed for
one or more conditions. For example, in a query C1 AND
C2 AND C3 AND C4, a result set is instantiated for the
most selective condition and for each result flag the
three other conditions are checked accordingly. If at
least one condition does not match, the corresponding
result flag is deleted from the result set and the
result counter is adjusted accordingly.

Finally, the result set flags all matching
InfoCourses (records) and the result counter has the
correct number of hits, which may be reported to an
application.

For example, the "Lean AND" implementation can be
advantageous when the time to instantiate the result
flag instances of the second result set exceeds the
time to check the corresponding InfoCourses.
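
The "Lean AND" procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: records are modelled as dictionaries keyed by a record ID, conditions as predicates, and the hit counts per condition are passed in as already known. The function name lean_and and the field names are hypothetical, and the initial result set is built here by a plain scan, whereas the described system would obtain it from its tree structures.

```python
def lean_and(records, conditions, hit_counts):
    """Combine flat AND-conditions using a single result set."""
    # The condition with the fewest hits has the highest selectivity.
    order = sorted(range(len(conditions)), key=lambda i: hit_counts[i])
    first, rest = order[0], order[1:]

    # Only one result set is instantiated (stand-in for a tree lookup).
    result_set = [rid for rid, rec in records.items()
                  if conditions[first](rec)]
    counter = len(result_set)

    kept = []
    for rid in result_set:
        # Check the remaining conditions directly on the record.
        if all(conditions[i](records[rid]) for i in rest):
            kept.append(rid)
        else:
            counter -= 1  # result flag deleted, counter adjusted
    return kept, counter


records = {
    1: {"NAME_FIRST": "Peter", "CITY": "Berlin"},
    2: {"NAME_FIRST": "Peter", "CITY": "Paris"},
    3: {"NAME_FIRST": "Anna", "CITY": "Berlin"},
    4: {"NAME_FIRST": "Peter", "CITY": "Rome"},
}
conditions = [
    lambda r: r["NAME_FIRST"] == "Peter",  # 3 hits
    lambda r: r["CITY"] == "Berlin",       # 2 hits, most selective
]
print(lean_and(records, conditions, hit_counts=[3, 2]))
```

No second result set is created; the less selective conditions are evaluated only for the records already flagged.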
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example shows that a bitmap can be made up of tens of 
thousands or even more machine words. The number of
machine words in a bitmap is only physically limited by
the size of the available main memory and the
addressability of the main memory.
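
The size figures of the 5 million record example can be checked with a few lines of Python. The helper name bitmap_footprint is an assumption for this sketch; it rounds up to whole machine words, which the example does not need because 5,000,000 is divisible by both 32 and 64.

```python
def bitmap_footprint(num_anchors, word_bits):
    """Return (machine words, bytes) for one bit per IC-anchor."""
    words = -(-num_anchors // word_bits)  # ceiling division
    return words, words * word_bits // 8


for bits in (64, 32):
    words, size = bitmap_footprint(5_000_000, bits)
    print(bits, words, size, round(size / 1024, 1))
```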

That is, each bitmap may consist of a
theoretically unlimited number of machine words, where
the length of a machine word depends on the given
hardware platform and/or the operating system of the
data storage system. Each bitmap is referenced by a
result ID. Preferably, the result IDs 351, 352 are
stored in a tree structure allowing direct access to
the start address of the corresponding bitmap via a
pointer. In general, any structure (e.g., a linear
list) can be used to administer the result IDs.
However, for large numbers of result IDs the access to
a specific result set is more efficient when using a
tree structure than when using a linear list or another
structure.

When the start address of a specific bitmap has
been found, this specific bitmap can be used to count
the number of hits (number of result flags) or to
return data to an application.

In the second implementation, each bitmap has a
counter counting the number of result flags, that is,
the number of bits set to 1. To count the number of
hits, the data storage system can run through all
machine words of the bitmap. If a machine word has a
value of zero, then all bits of the machine word are
zero and the next machine word can be checked. For
machine words having a value different from zero the
data storage system determines the number of bits that
are set to 1. This can be achieved by known methods,
such as shifting the bits of a machine word into one
direction, testing with bit masks, performing a bit by
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the basis address of bitmaps in the memory of the data 
storage system. 

For example, a first bitmap BM-n corresponds to
the first result set 361 and a second bitmap BM-n+2
corresponds to the second result set 362. The result
flags C-FLAG1, C-FLAG2 are implemented as bits in the
respective bitmaps.

A bitmap in general consists of multiple machine
words. Depending on the hardware architecture of the
data storage system, a machine word can consist, for
example, of 32 or 64 bits. The second implementation
also works with any other machine word length, such as
128 bits or more. A bitmap is a contiguous concatenation
of machine words in a sufficiently large area of the
data storage system memory. The number of bits in a
(result set) bitmap corresponds to the number of
IC-anchors of the InfoCourses selected by the Boolean
expression 500. Therefore, each bitmap has the maximum
size of a result set. Multiple bitmaps can
simultaneously exist in the memory. Each bitmap (result
set) is identified by a corresponding result ID. Each
result ID points to the start address of its
corresponding bitmap. For example, the first result ID
351 and the second result ID 352 point to the start
address of the first bitmap BM-n and the second
bitmap BM-n+2, respectively.

Assume 5 million records (InfoCourses) are loaded
into the tree structures of the data storage system.
Therefore, 5 million IC-anchors exist and, in this
example, one bitmap includes 5 million bits (one bit
per IC-anchor). The bitmap occupies 5,000,000 / 8 =
625,000 bytes, or about 610 KB. The 5 million bits
correspond to 5,000,000 / 64 = 78,125 machine words on
a 64 bit hardware platform and to 5,000,000 / 32 =
156,250 machine words on a 32 bit hardware platform.
This
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the specific bit is calculated as BA = (MWA - SA) * 64
+ K for 64 bit long machine words and BA = (MWA - SA) *
32 + K for 32 bit long machine words (in both cases the
word offset MWA - SA is multiplied by the machine word
length and the bit number K is added). At this memory
location, the specific bit of the bitmap can be
checked. If it is set to 1, the InfoCourse with the
corresponding IC-anchor is part of the result set.

The IC-anchor can be found in the IC-anchor memory
area in the following way. All IC-anchors reside in the
IC-anchor memory area with the basis address C. The
size AS of an IC-anchor is known. Therefore, the
IC-anchor address AA of the specific IC-anchor can be
calculated as AA = C + BA * AS. Therefore, a pointer
that is set to the address AA points to the requested
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IC-anchor. 

When creating a result set bitmap in compliance
with a portion of the Boolean expression, the result
flag bits that relate to IC-anchors of the
corresponding InfoCourses are set to 1. This can also
be achieved by using relative addresses.

Each IC-anchor has a memory address AA. The basis
address of the IC-anchor memory area is C. By knowing
the size AS of an IC-anchor, the IC-anchor position
number can be calculated as BA = (AA - C) / AS. That
is, the result flag bit for the BAth IC-anchor is to be
located in the bitmap memory area. The start address of
a specific bitmap (result set identified by result ID)
is SA. The machine word address MWA where the bit is
located is calculated as MWA = SA + BA div 64 on a 64
bit hardware platform and MWA = SA + BA div 32 on a 32
bit hardware platform, where the div operator divides
one integer number by another integer number and
returns the integer part of the result. Within the
identified machine word at MWA the Kth bit has to be
set to 1 with K = BA mod 64 on a 64 bit hardware
platform and K = BA mod 32 on a 32 bit hardware
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bit AND operation, etc. Each time a bit is set to 1, 
the counter is increased by 1. At the end of the
procedure the counter value corresponds to the number
of bits set to 1 and, therefore, the number of result
flags in the corresponding result set.
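
A minimal Python sketch of this counting procedure is given below. It assumes that the bitmap is held as a list of non-negative integers, one per machine word, and the name count_hits is hypothetical. Instead of shifting and masking bit by bit, the sketch uses the common trick of clearing the lowest set bit with w & (w - 1); this technique is swapped in for brevity and is not taken from the description.

```python
def count_hits(words):
    """Count the bits set to 1 in a bitmap given as machine words."""
    hits = 0
    for w in words:
        if w == 0:
            continue  # all bits zero, go to the next machine word
        while w:
            w &= w - 1  # clears the lowest bit that is set to 1
            hits += 1
    return hits


print(count_hits([0, 0b1011, 0, 1 << 63]))
```

The inner loop runs once per set bit, so almost empty machine words are cheap and zero words are skipped entirely.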

FIG. 14 illustrates how result flags relate to
corresponding IC-anchors in the second implementation
using memory mapping. The shown bits in the bitmap
memory represent only a portion of the bitmap memory
area.

In contrast to the first implementation, bits of a
bitmap and the corresponding IC-anchors are not linked
by pointers.

However, IC-anchors and their corresponding bits
are related by a memory mapping rule using relative
addresses. A memory manager of the data storage system
can ensure that the IC-anchors and the bitmaps reside
in contiguous memory areas. The data storage system can
then locate any IC-anchor that relates to a specific
bit in a bitmap.

For retrieving data (InfoCourses) in response to a
query the data storage system identifies the
corresponding IC-anchors. Using the IC-anchor and the
corresponding InfoCourse tree a specific node in an
InfoType tree can be found and the value can be read
from the node. The value can then be copied, for
example, to a Send-Buffer-Structure as described
earlier. To locate the specific bit that corresponds to
the identified IC-anchor the data storage system can
use an algorithm that works with relative addresses.

A specific bit is part of a machine word. Assume
that this specific bit is bit number K. The machine
word has a memory address MWA. The whole bitmap has a
start address SA. For example, the relative address of
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1 in the resulting machine word. The sum of the
counting results for all machine words of the resulting
bitmap corresponds to the total number of bits set to 1
in the resulting bitmap and may be reported to an
application as the number of hits.

The application of the Boolean NOT operator to a
bitmap is performed as a bit by bit NOT operation
applied to each machine word in the bitmap. Again, the
result may overwrite the original bitmap or can be
written to another bitmap if the original bitmap has to
be kept for later use.
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The "Lean AND", as described under FIG. 12, only
needs one result set and can also be implemented when
using bitmaps. For example, the Boolean expression
includes five conditions that are combined by Boolean
AND operators: C1 AND C2 AND C3 AND C4 AND C5. A result
set bitmap is set up (as described in FIGs. 13, 14) for
the condition with the highest selectivity of all
conditions included in the Boolean expression. Then the
data storage system runs through the bitmap from the
first to the last bit. For each bit that is set to 1
the data storage system jumps to the corresponding
InfoCourse and checks if all other conditions are
fulfilled by the corresponding InfoCourse. This check
is performed in the same way as described in the
implementation using result flag instances (cf. FIG.
12). If all conditions are true the bit keeps its value
1, otherwise the bit is set to 0. When a bit is set to
0, the corresponding result counter containing the
number of bits that are set to 1 is reduced by 1.
Therefore, the result counter always contains the
current number of hits.

Alternatively, instead of getting the number of 
hits from the initial bitmap and reducing the counter 
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platform, where the mod operator divides two integer 
numbers and returns only the remainder. Alternatively, 
K could also be calculated as: 

K = BA - (MWA - SA) * BS, where BS is the addressable
number of bits in the used hardware platform/operating
system.
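
The memory mapping rule in both directions can be sketched as follows. The sketch is an illustration under stated assumptions: the bitmap addresses SA and MWA are counted in units of machine words (so that MWA = SA + BA div 64 holds as written), the IC-anchor addresses C and AA are counted in bytes, and the function names locate_bit and anchor_address as well as the sample numbers are hypothetical.

```python
def locate_bit(aa, c, anchor_size, sa, word_bits=64):
    """IC-anchor address -> (BA, MWA, K) of its result flag bit."""
    ba = (aa - c) // anchor_size   # BA = (AA - C) / AS
    mwa = sa + ba // word_bits     # MWA = SA + BA div 64
    k = ba % word_bits             # K = BA mod 64
    return ba, mwa, k


def anchor_address(mwa, k, sa, c, anchor_size, word_bits=64):
    """Bit K in the machine word at MWA -> IC-anchor address AA."""
    ba = (mwa - sa) * word_bits + k  # BA = (MWA - SA) * 64 + K
    return c + ba * anchor_size      # AA = C + BA * AS


# Hypothetical layout: anchors of 48 bytes starting at address 1000.
ba, mwa, k = locate_bit(aa=1000 + 130 * 48, c=1000, anchor_size=48, sa=0)
print(ba, mwa, k)
print(anchor_address(mwa, k, sa=0, c=1000, anchor_size=48))
```

The alternative formula K = BA - (MWA - SA) * BS yields the same bit number as BA mod BS, since MWA - SA is the integer quotient BA div BS.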

FIG. 15 illustrates how AND/OR/NOT operators can be
applied to the first and second bitmap BM-n, BM-n+2 by
sequentially combining the corresponding pairs of
machine words. The machine words are illustrated by
circles with a number that corresponds to the position
of the machine word within its bitmap.

Machine word 1 of the first bitmap BM-n is
combined with machine word 1' of the second bitmap
BM-n+2. This is repeated for all pairs of machine words
(2,2'), (3,3'), and so on, with respect to the first
and second bitmaps. Since all IC-anchors are
represented by a corresponding bit in each of the
bitmaps, all bitmaps have the same size and, thus,
contain the same number of machine words.

The logical combination of pairs of machine words
by applying the Boolean AND or OR operators can be
performed as a bit by bit AND or OR operation. Usually,
the CPU (processor) can perform this in one processing
cycle. Programming languages, such as C++, offer
operators for this kind of bit by bit operation.

The result of the combination of first and second
result set bitmaps may be written to a new, third
bitmap (e.g., BM-n+1) or to one of the two original
bitmaps BM-n, BM-n+2. This depends on whether the
original bitmaps may be overwritten or are to be kept
for later use.
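
The word-by-word combination of two bitmaps can be sketched in Python as follows. The sketch assumes 64 bit machine words stored as integers in lists of equal length; the names combine, negate and MASK64 are hypothetical. Because Python integers are not limited to a machine word, the NOT operation is masked to 64 bits explicitly.

```python
MASK64 = (1 << 64) - 1  # keeps results within one 64 bit machine word


def combine(first, second, op):
    """Combine corresponding pairs of machine words into a new bitmap."""
    if len(first) != len(second):
        raise ValueError("bitmaps must have the same number of words")
    if op == "AND":
        return [a & b for a, b in zip(first, second)]
    if op == "OR":
        return [a | b for a, b in zip(first, second)]
    raise ValueError("unsupported operator: " + op)


def negate(bitmap):
    """Bit by bit NOT applied to each machine word."""
    return [~w & MASK64 for w in bitmap]


bm_n = [0b1100, 0]
bm_n2 = [0b1010, 1]
bm_n1 = combine(bm_n, bm_n2, "AND")          # written to a third bitmap
print(bm_n1, sum(bin(w).count("1") for w in bm_n1))
```

Returning a new list corresponds to writing into a third bitmap; overwriting one of the originals would instead assign the result back to it.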

After the processing of each pair of machine words
the result flag counter counts how many bits are set to



Empfansszei t 8-Juli 10:52 






JUL-2003 11 = 00 SRP RG UIALLDORF +49 622? 7S0251 S. 

2003P00396 BP J '■*■■• 

sets of the first implementation. Further, bit by bit
operations are performed very fast and can usually be
handled in one CPU processing cycle per machine word,
if supported by a programming language, such as C++.

For very large result sets (e.g., several millions
of hits), it can be more time saving to perform Boolean
combinations of result sets using bitmaps instead of
pointer lists of result flag instances. The
instantiation of millions of result flag instances
alone may last several seconds if parallelisation is
not used. In addition, the time for performing the
Boolean combination has to be considered. The Boolean
combination of result flag instances is performed one
by one at the instance level.

However, when a result set contains only a small
number of hits then bitmaps may be "almost empty". That
is, only a small number of bits is set to 1 in a large
number of machine words (e.g., for 5 million
InfoCourses a bitmap includes 78,125 machine words on a
64 bit platform). In a bad case only one bit is set to
1 in each machine word. Therefore, for small result
sets the use of result flag instances may be
advantageous.

The computer program product components in FIG. 16
allow the data storage system to switch from one
implementation to another implementation if the other
implementation is more advantageous in a specific
environment. For example, this can be achieved by
transforming pointer lists of the first implementation
into bitmaps of the second implementation and vice
versa. For these transformations the above explained
procedures for creating result flag instances and for
creating bitmaps can be used.

Advantageously, the data storage system decides by
itself when it is appropriate to use the first




each time a bit is set to 0 when an InfoCourse does not 
match the other conditions, the counter may also count 
the number of bits that are set to 1 in the bitmap 
after the "Lean AND" has been applied. 

FIG. 16 is a simplified block diagram of software
components of the computer system 990 that can be used
with an embodiment of the invention to dynamically
select a data retriever implementation depending on
a specific environment.

A result set may potentially contain millions of
result flags given a sufficiently large number of
InfoCourses loaded. In the first implementation, for
example, on a 64 bit architecture one pointer address
already occupies 64 bits (8 bytes). Each result flag has
at least two pointers plus the content of the result
flag. Therefore, one result flag may occupy several
hundreds of bytes. In this case, one result set
containing some millions of result flags occupies
memory space in the range of up to several hundreds of
megabytes. This is in addition to the memory space
occupied by the tree structures (cf. FIGs. 2 to 8) that
also reside in main memory. Further, as the data
storage system in the first implementation processes
the result sets sequentially and checks result flag
instance by result flag instance to perform AND/OR
combinations, this may lead to processing times of
several seconds for one combination when applied to
very large result sets (e.g., several millions of
result flag instances). One can apply an appropriate
parallelisation to the first implementation or use the
second implementation to overcome these issues.

The second implementation uses bitmaps for result
sets. Bitmaps are a representation of result sets that
consumes considerably less memory space than the result
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value has a predefined value. FIG. 17 explains details
about how to initialise the threshold value. 

In case the number of hits (result flags) is below
the threshold value, the data storage system uses the
first data retriever 111. The first data retriever is
configured according to the first implementation as
described in FIG. 12.

In case the number of hits (result flags) is above
the threshold value, the data storage system uses the
second data retriever 112. The second data retriever is
configured according to the second implementation as
described in FIGs. 13 to 15.

The retrieval time measuring component 104 of the
data storage system can measure the time that is
consumed by either implementation for the various sizes
of result sets.

The threshold calculator 105 can dynamically
determine (re-calculate) the threshold value on the
basis of the time measurements with respect to the
first and second implementations. The recalculated
threshold value can be fed into the threshold evaluator
103 and used for the next query. For example, the
threshold calculator 105 can superpose the measurement
results for the first data retriever 111 with the
measurement results of the second data retriever 112
and determine the number of hits where both components
consume the same time (break even point) by using an
appropriate interpolation algorithm. More details about
the threshold calculator are explained in FIG. 18.
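
One possible interpolation is sketched below in Python. It is a hedged illustration only: the description does not prescribe a specific algorithm, so linear interpolation between neighbouring measurement points is an assumption, as are the function name break_even, the requirement that both retrievers were measured at the same result set sizes, and the sample timings.

```python
def break_even(sizes, t_first, t_second):
    """Estimate the number of hits where both retrievers take equal time.

    `sizes` are increasing result set sizes; `t_first` and `t_second`
    are the retrieval times measured at these sizes. Returns None when
    the measured curves do not cross.
    """
    diffs = [a - b for a, b in zip(t_first, t_second)]
    for i in range(len(sizes) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(sizes[i])
        if d0 * d1 < 0:  # sign change: the curves cross in between
            frac = d0 / (d0 - d1)
            return sizes[i] + frac * (sizes[i + 1] - sizes[i])
    if diffs and diffs[-1] == 0:
        return float(sizes[-1])
    return None


# Hypothetical timings in milliseconds for two result set sizes.
print(break_even([1000, 10000], [1.0, 5.0], [3.0, 3.0]))
```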
In general, there can be multiple threshold values
that correspond to more than two data retrievers. For
example, if a further data retriever performs better
than the second data retriever 112 for
very large result sets (e.g., more than 10 million
hits), then the threshold calculator can determine a






implementation that is useful for small result sets
having a number of result flags below a threshold value
(e.g., up to several thousand elements) or the second
implementation having a number of result flags above
the threshold value. This enables the data storage
system to automatically select the implementation which
is best in terms of memory consumption and performance
in a specific situation. It is not important which
implementation is used when the number of result flags
equals the threshold value.

The threshold value may be a break even point
being defined as the number of result flags in a result
set, where the use of result flag instances leads to
the same system performance as the use of bitmaps. The
threshold value is not necessarily a predefined
parameter of the data storage system but the data
storage system can determine the threshold value
dynamically, for example, by appropriate time
measurements. Therefore, on a given technology platform
for a given data volume, data value distribution, etc.,
the appropriate value for the threshold value can be
used at all times.

The query generator 101 generates the query that
includes the Boolean expression 500. For example, the
query generator can be implemented on the front end
computing device 901. The query generator 101 can also
be part of an application that runs on any other
computing device of the computer system 990.

Once the data storage system 902 receives the
Boolean expression through a corresponding interface,
the result counter 102 determines the corresponding
number of hits. Preferably, the result counter is
implemented in the data storage system 902.

The threshold evaluator 103 compares the number of
hits with the threshold value. Initially, the threshold
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the way the software is influenced by the parameter 
settings. Furthermore, the specialist has to know the
value ranges of each parameter. In practice, it turns
out that the more profile parameters are available, the
less likely it is that the specialist will succeed in
tuning the software for optimal performance.

For this reason, an embodiment of the invention
can be used to reduce the number of profile parameters
to a necessary minimum. Specialists working from
outside the software should tune only parameters that
depend on a specific use case or business scenario,
where pre-tuning of the software is difficult. For
example, consider the pre-tuning of relational database
management systems, and in particular deciding in
advance which indexes to create, which depends on the
final structure of the various tables in the database
system.

In this embodiment of the invention, the software
itself can tune scenario-independent profile
parameters, such as the threshold values for the
various implementations (cf. FIGs. 12 to 17). The
initial values may be set during start-up of the
software by running predefined test cases for the
various implementations.

The example of FIG. 17 illustrates the automatic
determination of a threshold value with regard to two
algorithms A1 (illustrated by bullet points) and A2
(illustrated by circles). For example, the first
algorithm A1 may be used by the first implementation
and the second algorithm A2 may be used by the second
implementation. In the example, a parameter p is tuned
to a series of discrete values between a pair of chosen
extreme values. The spacing of values between the
extremes need not be very fine and can be equidistant.
For each parameter value p(x), a measurement is made of
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further threshold value . that corresponds to the break 
even point between the second data retriever and the
further data retriever. In this case, the threshold
evaluator 103 compares the count result of the result
counter 102 to two threshold values and selects the
appropriate data retriever out of at least three data
retrievers.
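
A threshold evaluator for more than two data retrievers can be sketched as follows. This is a minimal Python illustration under stated assumptions: the retrievers are represented by placeholder names, the threshold of 5,000 hits is hypothetical, and the use of a binary search over the sorted threshold values is a design choice of the sketch rather than something stated in the description.

```python
from bisect import bisect_right


def select_retriever(hits, thresholds, retrievers):
    """Select the data retriever whose interval contains `hits`.

    `thresholds` must be sorted in ascending order and there must be
    exactly one more retriever than there are threshold values.
    """
    if len(retrievers) != len(thresholds) + 1:
        raise ValueError("need one more retriever than thresholds")
    return retrievers[bisect_right(thresholds, hits)]


thresholds = [5_000, 10_000_000]   # hypothetical break even points
retrievers = ["result_flags", "bitmaps", "further_retriever"]
for hits in (100, 1_000_000, 20_000_000):
    print(hits, select_retriever(hits, thresholds, retrievers))
```

A count that equals a threshold value falls into the upper interval here; as noted in the description, the choice at the threshold itself is not important.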

Within a threshold value interval, there may be
various data retrievers available, each of these data
retrievers performing best in a given situation. For
example, in the case of result sets below the threshold
value, the first data retriever 111 (first
implementation applied to two result sets) can be used
when the Boolean expression has a certain degree of
complexity. However, alternatively, a "Lean AND" data
retriever can be used when the Boolean expression has
a simpler structure and requires only a single result
set as explained under FIGs. 12 and 15.

That is, there can be further dependencies that
are considered by the threshold evaluator when
selecting the appropriate data retriever.

FIG. 17 illustrates initialising threshold values.
Regarding the initial definition of threshold values,
one alternative is to provide a profile parameter for
each threshold value in the profile file for the
software. Profile files are commonly used, for example,
for defining buffer sizes, time out parameters,
hardware configuration parameters, or parameters for
determining software behaviour in specific situations,
such as error handling. For example, some parameters
may influence the performance of a software
application. Usually, the setting of profile parameters
requires a specialist who is familiar with the
architecture of the software and has a good feeling for
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If the difference is negative, D(p(c)) < 0, then
the threshold value lies in the left half-interval
[p(a), p(c)].

If the difference is zero, D(p(c)) = 0, then the
threshold value is determined exactly by the parameter
value p(c) and the iteration is complete.

If the magnitude of the difference is greater than a
predefined delta, the iteration continues. The
half-interval containing the threshold value is
subdivided into two smaller half-intervals (which are
either [p(c), p(d)] and [p(d), p(b)] or [p(a), p(d)]
and [p(d), p(c)], depending on whether D(p(c)) is
positive or negative) and the performance difference
D(p(d)) is evaluated for the parameter value p(d), and
so on, as above.
The procedure stops as soon as the threshold value
has been identified with sufficient precision. This may
depend on:

The size of the measured difference D(p), for
example, whether D(p) < delta, for some predefined
minimal difference delta.

The type of the parameter, for example, whether
p(x) is of type integer or floating point.
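
The interval bisection with both stopping criteria can be sketched in Python as follows. This is a hedged illustration, not the definitive procedure: the function name find_threshold, the min_width parameter (standing in for the parameter type, e.g. 1 for integer-valued parameters) and the synthetic difference function in the usage example are assumptions. The callable diff is expected to return the measured difference D(p) = PA1(p) - PA2(p), positive at p_a and negative at p_b.

```python
def find_threshold(diff, p_a, p_b, delta, min_width=1.0):
    """Locate the parameter value where D(p) changes sign."""
    while p_b - p_a > min_width:
        p_c = (p_a + p_b) / 2      # middle of the current interval
        d = diff(p_c)              # new measurement D(p(c))
        if abs(d) < delta:
            return p_c             # difference is small enough
        if d > 0:
            p_a = p_c              # threshold lies in the right half
        else:
            p_b = p_c              # threshold lies in the left half
    return (p_a + p_b) / 2


# Synthetic difference with a break even point at p = 4000.
estimate = find_threshold(lambda p: 4000 - p, 0, 10000, delta=1)
print(estimate)
```

Each step halves the interval, so only a small number of measurements is needed even when the initial spacing between p(a) and p(b) is coarse.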

In this way, this embodiment of the invention can
calculate initial values for all threshold values
during start-up.

This start-up calculation can last for several
milliseconds or even seconds before the software is up
and running.

However, the software tunes itself automatically
and optimally on the given environment (e.g., given
hardware, operating system). It can be expected to do
so more quickly, more exactly, and more inexpensively
than a specialist could tune the software by manually
setting profile parameters.
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the performance difference D(p(x)) (defined as runtime
difference or any other suitable measure) between the
two algorithms. For example, the difference may be
defined as D(p(x)) = PA1(p(x)) - PA2(p(x)). For each
difference D(p(x)), either a single measurement is made
or the average of several runs can be taken.

Below the threshold value, the performance of
algorithm A1 decreases steadily with increasing
parameter value p(x) whereas the performance of
algorithm A2 increases. So the magnitude of the
performance difference D(p(x)) between the algorithms
decreases but always has the same sign. For example, if
D(p(x)) = PA1(p(x)) - PA2(p(x)), then the difference is
positive as long as algorithm A1 has a better
performance than algorithm A2.

The measured performance difference D(p(a)) for
parameter value p(a) is the last positive difference,
so p(a) is the greatest value of p such that D(p) > 0.
From parameter value p(b) onward, the difference is
negative, so p(b) is the least value of p such that
D(p) < 0. The threshold lies in the interval between
parameter values p(a) and p(b).

An iteration, for example based on an interval
bisection procedure, can be used to locate the
threshold parameter value within the interval
[p(a), p(b)].

For the first step of the iteration, the parameter
value p(a) is the left interval border, p(b) is the
right interval border and p(c) = [p(a) + p(b)] / 2 is
the middle of the interval. A new measurement D(p(c))
of the performance difference between the two
algorithms is made for parameter value p(c).

If the difference is positive, D(p(c)) > 0, then
the threshold value lies in the right half-interval
[p(c), p(b)].
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If the current execution of algorithm A1
corresponds to some value p(x) of parameter p that is
below the threshold value B1, the performance PA1(p(x))
of algorithm A1 should in general be better than its
performance at the corresponding threshold value B1,
since this is the reason why the software executes
algorithm A1 instead of algorithm A2 (see FIG. 17).

At the threshold value B1, the performance of
algorithm A1 is by definition the same as that of
algorithm A2, that is, PA1(B1) = PA2(B1).

In FIG. 18, for the algorithm An (1 < n < N) 
chosen in the interval between two neighbouring 
threshold values B(n - 1) and Bn, at the parameter 
value pc the performance PAn(pc) of the algorithm in 
that interval should be either the same as or better 
than its performance PAn(B(n - 1)) and PAn(Bn) at the 
lower and upper threshold values B(n - 1) and Bn. 

If the performance PAn(pc) of algorithm An (middle 
arrow) is below its performance PAn(B(n - 1)) (left 
arrow) or PAn(Bn) (right arrow) at a neighbouring 
threshold value (either B(n - 1) or Bn, respectively), 
then it is no longer reasonable to choose algorithm An 
at parameter value pc. 

Therefore, in future, at parameter value pc, the 
software chooses the algorithm that was earlier 
measured as performing better in the neighbouring 
parameter interval, which is either algorithm A(n - 1) 
or A(n + 1). This choice is equivalent to moving the 
threshold value B(n - 1) or Bn, respectively, to a new 
position at parameter value pc. 
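The threshold move can be sketched as a small function. This is an illustration under assumptions: the name `adjust_thresholds` and its argument layout are hypothetical, performance is treated as "higher is better", and checking the lower threshold first when both comparisons fail is an arbitrary choice that the specification does not prescribe.

```python
def adjust_thresholds(lower: float, upper: float, pc: float,
                      perf_at_pc: float,
                      perf_at_lower: float,
                      perf_at_upper: float) -> tuple[float, float]:
    """Return the (possibly moved) thresholds B(n-1) and Bn around algorithm An.

    lower, upper   -- current threshold values B(n-1) and Bn
    pc             -- parameter value of the latest execution of An
    perf_at_pc     -- PAn(pc)
    perf_at_lower  -- PAn(B(n-1))
    perf_at_upper  -- PAn(Bn)
    """
    if perf_at_pc < perf_at_lower:
        # An is worse at pc than at B(n-1): let A(n-1) cover pc from now on,
        # which is the same as moving B(n-1) up to pc.
        return pc, upper
    if perf_at_pc < perf_at_upper:
        # An is worse at pc than at Bn: let A(n+1) cover pc from now on,
        # which is the same as moving Bn down to pc.
        return lower, pc
    return lower, upper  # An still performs as expected; nothing moves
```

For example, `adjust_thresholds(100, 1000, 400, 5.0, 8.0, 6.0)` moves the lower threshold to 400, so that executions at parameter value 400 are handled by algorithm A(n - 1) in future.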

However, when algorithm A(n - 1) or A(n + 1) is 
next run at parameter value pc, it may be the case that 
the newly measured performance of the algorithm is also 
reduced, possibly even more so than the performance of 
the original algorithm An. In this case, the threshold 
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FIG. 18 illustrates adjusting threshold values 
dynamically during operation by using the threshold 
calculator 105. There can be multiple algorithms A1 to 
AN, where each algorithm is used for a corresponding 
parameter value range of the parameter p. That is, each 
algorithm has at least one threshold value that 
represents a boundary of the corresponding parameter 
value range. 

Even if the initial threshold values are correct at 
start-up and for some time after start-up, this 
situation may change in the course of time, for 
example, because of memory fragmentation, 
accumulating memory leaks due to bugs in the coding, or 
other reasons. Therefore, after a certain time the 
software can run under conditions that differ from 
those that prevailed immediately after start-up. 

The performance of any of the algorithms A1 to AN 
may degrade or improve by different amounts relative to 
the other algorithms. Therefore, the corresponding 
threshold values may shift in the course of time. 

This embodiment of the invention can automatically 
and regularly repeat its determination of its threshold 
values, as specified in the above calculation (cf. FIG. 
17), so as to adjust the threshold values used to 
switch algorithms dynamically during runtime. 

To revise its determination of threshold values, 
the software makes automatic performance measurements, 
for example, by using the retrieval time measuring 
component 104 (cf. FIG. 16). For example, the 
measurements can simply be records of the time taken 
for certain tasks to run. The measurements can be made 
either on an ongoing basis or from time to time. 

When using ongoing performance measurements, the 
software measures the performance of each execution of 
an algorithm. 
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However, if an expert explicitly sets a parameter value 
by entering it in the profile file, then the software 
does not change the value of this parameter. 
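The precedence of an expert-set value over a self-tuned value can be sketched as a simple lookup. This is only an illustration: the function name `resolve_parameter`, the dictionary representation of the profile file and the parameter name `threshold_b1` are hypothetical.

```python
def resolve_parameter(name: str,
                      profile: dict[str, float],
                      self_tuned: dict[str, float]) -> tuple[float, bool]:
    """Return (value, tunable) for a profile parameter.

    profile     -- parameters explicitly entered in the profile file
    self_tuned  -- values the software has determined by itself
    """
    if name in profile:
        # An expert set this value explicitly: use it and never retune it.
        return profile[name], False
    # Default case: the parameter is absent from the profile file,
    # so the software keeps tuning it.
    return self_tuned[name], True


# Hypothetical example: an expert pinned the threshold at 500 hits.
value, tunable = resolve_parameter(
    "threshold_b1", {"threshold_b1": 500}, {"threshold_b1": 420})
```

In this sketch the threshold calculator would skip any parameter for which `tunable` is false.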

Embodiments of the invention can be implemented in 
digital electronic circuitry, or in computer hardware, 
firmware, software, or in combinations of them. The 
invention can be implemented as a computer program 
product, i.e., a computer program tangibly embodied in 
an information carrier, e.g., in a machine-readable 
storage device or in a propagated signal, for execution 
by, or to control the operation of, data processing 
apparatus, e.g., a programmable processor, a computer, 
or multiple computers. A computer program can be 
written in any form of programming language, including 
compiled or interpreted languages, and it can be 
deployed in any form, including as a stand-alone 
program or as a module, component, subroutine, or other 
unit suitable for use in a computing environment. A 
computer program can be deployed to be executed on one 
computer or on multiple computers at one site or 
distributed across multiple sites and interconnected by 
a communication network. 

Method steps of the invention can be performed by 
one or more programmable processors executing a 
computer program to perform functions of the invention 
by operating on input data and generating output. 
Method steps can also be performed by, and apparatus of 
the invention can be implemented as, special purpose 
logic circuitry, e.g., an FPGA (field programmable gate 
array) or an ASIC (application-specific integrated 
circuit). 

Processors suitable for the execution of a 
computer program include, by way of example, both 
general and special purpose microprocessors, and any 
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value B(n - 1) or Bn, respectively, should not have 
been moved to the new position pc. 

This situation may occur in practice because any 
reasons for the reduced performance of algorithm An may 
also apply to reduce the performance of algorithm 
A(n - 1) or A(n + 1). 

Therefore, a situation like this can trigger a 
revised determination of all the threshold values, 
either immediately or as soon as practically possible, 
for example, when the system load is sufficiently low. 
Alternatively, the software can generate a system 
message to warn an administrator that the latest 
performance measurements indicate the need for a 
revised determination of the threshold values. 
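The reaction to such a situation can be sketched as a decision function. This is an illustration only: the name `next_action`, the returned labels, the `load_limit` value and the simplification of "also reduced" to a direct comparison of the two measurements at pc are all assumptions.

```python
def next_action(original_perf: float, neighbour_perf: float,
                system_load: float, load_limit: float = 0.5,
                notify_only: bool = False) -> str:
    """Decide what to do after the neighbouring algorithm ran at pc.

    original_perf   -- PAn(pc), measured before the threshold was moved
    neighbour_perf  -- performance of A(n-1) or A(n+1) at pc after the move
    system_load     -- current load as a fraction between 0 and 1 (assumed)
    """
    if neighbour_perf >= original_perf:
        return "keep_moved_threshold"      # the move paid off
    if notify_only:
        return "warn_administrator"        # system message instead of action
    if system_load <= load_limit:
        return "recalculate_now"           # load is sufficiently low
    return "recalculate_when_idle"         # defer until load drops
```

A deployment would choose between the automatic recalculation and the administrator warning by configuration; the `notify_only` flag stands in for that choice here.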

When using performance measurements from time to 
time, the software recalculates the threshold values, 
preferably at times of low system load, in the same way 
that it does during start-up. This recalculation may 
be defined as part of a bundle of housekeeping tasks 
that are performed at regular intervals by the 
software. In this case, the threshold values are 
adjusted with a lower frequency than when using the 
ongoing basis alternative. 

Removing as many profile parameters as possible 
from a profile file and letting the software itself 
tune such parameters instead of a specialist can lead 
to an improved system performance over the full range 
of parameter values. 

However, in certain exceptional and rare 
situations, there may be good reasons why such 
parameters should not be tuned by the software but from 
outside by a specialist. 

These exceptional cases can be handled as follows. 
By default, profile parameters that are tuned by the 
software itself do not appear in the profile file. 
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e.g., an application server, or that includes a 
front-end component, e.g., a client computer having a 
graphical user interface or a Web browser through which 
a user can interact with an implementation of the 
invention, or any combination of such back-end, 
middleware, or front-end components. The components of 
the system can be interconnected by any form or medium 
of digital data communication, e.g., a communication 
network. Examples of communication networks include a 
local area network (LAN) and a wide area network (WAN), 
e.g., the Internet. 

The computing system can include clients and 
servers. A client and server are generally remote from 
each other and typically interact through a 
communication network. The relationship of client and 
server arises by virtue of computer programs running on 
the respective computers and having a client-server 
relationship to each other. 
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one or more processors of any kind of digital computer. 
Generally, a processor will receive instructions and 
data from a read-only memory or a random access memory 
or both. The essential elements of a computer are at 
least one processor for executing instructions and one 
or more memory devices for storing instructions and 
data. Generally, a computer will also include, or be 
operatively coupled to receive data from or transfer 
data to, or both, one or more mass storage devices for 
storing data, e.g., magnetic, magneto-optical disks, or 
optical disks. Information carriers suitable for 
embodying computer program instructions and data 
include all forms of non-volatile memory, including by 
way of example semiconductor memory devices, e.g., 
EPROM, EEPROM, and flash memory devices; magnetic 
disks, e.g., internal hard disks or removable disks; 
magneto-optical disks; and CD-ROM and DVD-ROM disks. 
The processor and the memory can be supplemented by, or 
incorporated in special purpose logic circuitry. 

To provide for interaction with a user, the 
invention can be implemented on a computer having a 
display device, e.g., a cathode ray tube (CRT) or 
liquid crystal display (LCD) monitor, for displaying 
information to the user and a keyboard and a pointing 
device, e.g., a mouse or a trackball, by which the user 
can provide input to the computer. Other kinds of 
devices can be used to provide for interaction with a 
user as well; for example, feedback provided to the 
user can be any form of sensory feedback, e.g., visual 
feedback, auditory feedback, or tactile feedback; and 
input from the user can be received in any form, 
including acoustic, speech, or tactile input. 

The invention can be implemented in a computing 
system that includes a back-end component, e.g., as a 
data server, or that includes a middleware component, 
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2. Method according to claim 1, wherein the merging 
step is performed by applying Boolean operators to 
the result flags of the corresponding InfoCourse 
anchors. 

3. Method according to claim 1 or 2, wherein the 
merging step starts with the result set (352) 
having the lowest number of result flags. 

4. Method according to any one of the claims 1 to 3, 
wherein the first and second portions (501, 502) 
comprise conditions that relate to InfoTypes. 

5. Method according to any one of the claims 1 to 4, 
wherein each result flag (C-FLAG1, C-FLAG2) is 
implemented as an instance of a result flag class, 
where the instance is related to its corresponding 
InfoCourse anchor through a docket pointer (330). 

6. Method of claim 5, wherein the result flags of the 
first and second result sets (361, 362) form a 
first pointer list (PL-1) and a second pointer list 
(PL-2), respectively, and each pointer list is 
linked to the corresponding result identification 
number (351, 352). 

7. Method according to any one of the claims 1 to 4, 
wherein each result flag (C-FLAG1, C-FLAG2) is 
implemented as a bit being related to its 
corresponding InfoCourse anchor through memory 
mapping. 
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Claims 



1. A computer implemented method for retrieving data 
from a data storage system (902), where the data 
storage system (902) stores the data using a data 
structure with at least one instance of a first 
data type (InfoCluster) associated with: 

- instances of a second data type (InfoTypes) 
representing data attributes, and 

- instances of a third data type (InfoCourses), 
representing data records, associated with said 
InfoTypes, wherein the InfoTypes are arranged in a 
first tree structure, and wherein the elements of 
an InfoCourse are arranged in a second tree 
structure, each InfoCourse (300, 301, 302, 303) 
having an InfoCourse anchor (310, 311, 312, 313), 
the method comprising the steps of: 

receiving a query comprising a Boolean expression 
(500) that includes a first portion (501) and 
at least a second portion (502); 

determining a first result set (361) comprising 
result flags (C-FLAG1) in compliance with the 
first portion (501) and at least a second 
result set (362) comprising further result 
flags (C-FLAG2) in compliance with the second 
portion (502), each result flag being related 
to a corresponding result identification number 
(351, 352) and being related to the 
corresponding InfoCourse anchor (310, 311, 312, 
313); and 

merging the first and the at least second result 
sets (361, 362) into a single result set (363). 
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14. Method according to any one of the claims 11 to 13, 
wherein the first implementation is implemented 
according to claim 5 or 6. 

15. Method according to any one of the claims 11 to 13, 
wherein the second implementation is implemented 
according to any one of the claims 7 to 10. 
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8. Method of claim 7, wherein the result flags of the 
first and second result sets (361, 362) form a 
first bit map (BM-n) and a second bit map (BM-n+2) , 
respectively. 

9. Method of claim 8, wherein the first result 
identification number (351) and the second result 
identification number (352) point to the start 
address of the first bit map (BM-n) and the second 
bit map (BM-n+2), respectively. 

10. Method according to any one of the claims 7 to 9, 
wherein the result identification numbers (351, 
352) are stored in a tree structure. 

11. A computer implemented method for retrieving data 
from a data storage system (902) comprising: 

retrieving the data by using the method according 
to claim 1 in a first implementation if the 
number of result flags in a result set is below 
a threshold value; and 

retrieving the data by using the method according 
to claim 1 in a second implementation if the 
number of result flags in the result set is 
above the threshold value. 

12. Method according to claim 11, wherein the threshold 
value is dynamically determined by time 
measurements with respect to the first and second 
implementations. 

13. Method according to claim 12, wherein the threshold 
value is the break-even point with respect to the 
performance of the first and second 
implementations. 
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18. Method according to claim 17, comprising the 
further step: 

deleting each result flag that relates to an 
InfoCourse anchor of an InfoCourse not 
fulfilling the condition. 

19. A computer program product for retrieving data from 
a data storage system (902) in response to a 
Boolean expression (500) comprising a plurality of 
instructions that when loaded into a memory of the 
data storage system (902) cause at least one 
processor of the data storage system (902) to 
execute the steps of any of the claims 1 to 10. 

20. A computer program product for retrieving data from 
a data storage system (902) in response to a 
Boolean expression (500) comprising a plurality of 
instructions that when loaded into a memory of the 
data storage system (902) cause at least one 
processor of the data storage system (902) to 
execute the steps of any of the claims 11 to 15. 

21. A computer program product for retrieving data from 
a data storage system (902) in response to a 
Boolean expression (500) comprising a plurality of 
instructions that when loaded into a memory of the 
data storage system (902) cause at least one 
processor of the data storage system (902) to 
execute the steps of any of the claims 16 to 18. 
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16. A computer implemented method for retrieving data 
from a data storage system (902) , where the data 
storage system (902) stores the data using a data 
structure with at least one instance of a first 
data type (InfoCluster) associated with: 

- instances of a second data type (InfoTypes) 
representing data attributes, and 

- instances of a third data type (InfoCourses), 
representing data records, associated with said 
InfoTypes, wherein the InfoTypes are arranged in a 
first tree structure, and wherein the elements of 
an InfoCourse are arranged in a second tree 
structure, each InfoCourse (300, 301, 302, 303) 
having an InfoCourse anchor (310, 311, 312, 313), 
the method comprising the steps of: 

receiving a query comprising a Boolean expression 
(500) that includes a first portion (501) and 
at least a second portion (502); 

determining a result set (362) comprising result 
flags (C-FLAG2) in compliance with the second 
portion (502), each result flag being related 
to the corresponding InfoCourse anchor (310, 
311); and 

checking for each InfoCourse (300, 301) relating to 
the corresponding InfoCourse anchor (310, 311), 
whether the InfoCourse fulfils the first 
portion (501) of the Boolean expression. 

17. Method according to claim 16, wherein the first 
portion (501) is a condition that is combined with 
the second portion (502) by the Boolean AND 
operator. 
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26. A data storage system (902) comprising: 

a memory to store data using a data structure with 
at least one instance of a first data type 
(InfoCluster) associated with: 

- instances of a second data type (InfoTypes) 
representing data attributes, and 

- instances of a third data type (InfoCourses), 
representing data records, associated with 
said InfoTypes, wherein the InfoTypes are 
arranged in a first tree structure, and 
wherein elements of an InfoCourse are arranged 
in a second tree structure, each InfoCourse 
(300, 301, 302, 303) having an InfoCourse 
anchor (310, 311, 312, 313); 

an interface to receive a query comprising a 
Boolean expression (500) that includes a first 
portion (501) and at least a second portion 
(502); and 

at least one processor to determine a first result 
set (361) comprising result flags (C-FLAG1) in 
compliance with the first portion (501) and at 
least a second result set (362) comprising 
further result flags (C-FLAG2) in compliance 
with the second portion (502), each result flag 
being related to a corresponding result 
identification number (351, 352) and being 
related to the corresponding InfoCourse anchor 
(310, 311, 312, 313); and to merge the first 
and the at least second result sets (361, 362) 
into a single result set (363). 
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22. A computer program product for retrieving data from 
a data storage system (902) in response to a 
Boolean expression (500) comprising: 
a result counter (102) to determine a number of 
hits in response to the Boolean expression; 

a threshold evaluator (103) to compare the number 
of hits with a threshold value; 

a first data retriever (111) to retrieve the data 
in case the number of hits is below the 
threshold value; and 

a second data retriever (112) to retrieve the data 
in case the number of hits is above the 
threshold value. 

23. The computer program product of claim 22, further 
comprising: 

a retrieval time measuring component (104) to 
measure the time that is consumed by the first 
and second data retrievers (111, 112) for 
various numbers of hits; and 

a threshold calculator (105) to dynamically 
determine the threshold value on the basis of 
the results of the retrieval time measuring 
component (104) with respect to the first and 
second data retrievers (111, 112) and to feed 
back the determined threshold value into the 
threshold evaluator (103). 

24. The computer program product according to claim 22 
or 23, where the first data retriever (111) is 
implemented according to claim 19. 

25. The computer program product according to claim 22 
or 23, where the second data retriever (112) is 
implemented according to claim 20. 

Time of receipt 8 July 10:52 

08-JUL-2003 11:04 SAP AG WALLDORF +49 6227 760251 S. 60/82 

2003P00396 EP 52 


32. Data storage system according to any one of the 
claims 30 to 31, where the memory stores the result 
identification numbers (351, 352) in a tree 
structure. 

33. Data storage system according to any one of the 
claims 30 to 32, where the processor performs the 
merging of the first and second bitmaps (BM-n, 
BM-n+2) in a bit by bit operation combining a 
machine word of the first bitmap (BM-n) with a 
corresponding machine word of the second bitmap 
(BM-n+2) in one processing cycle. 

34. A computer system (990) comprising: 

a computing device (901) using a query generator 
(101) to generate a Boolean expression (500) 
having a first portion (501) and at least a 
second portion (502); and 

a data storage system (902) operable in response to 
the Boolean expression (500) according to any 
one of the claims 26 to 31. 

35. A data storage system (902) comprising: 

a result counter (102) to determine a number of 
hits in response to a Boolean expression; 

a threshold evaluator (103) to compare the number 
of hits with a threshold value; 

a first data retriever (111) to retrieve the data 
in case the number of hits is below the 
threshold value; and 

a second data retriever (112) to retrieve the data 
in case the number of hits is above the 
threshold value. 
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27. Data storage system according to claim 26, where 
the first and second portions (501, 502) comprise 
conditions that relate to InfoTypes. 

28. Data storage system according to any one of the 
claims 26 to 27, where each result flag (C-FLAG1, 
C-FLAG2) is instantiated from a result flag class 
and the resulting result flag instance is stored in 
the memory and is related to its corresponding 
InfoCourse anchor through a docket pointer (330). 

29. Data storage system according to claim 28, where 
the memory further stores a first pointer list 
(PL-1) and a second pointer list (PL-2) comprising 
the result flag instances of the first and second 
result sets (361, 362), respectively, and each 
pointer list is linked to the corresponding result 
identification number (351, 352). 

30. Data storage system according to any one of the 
claims 26 to 27, where the memory stores the first 
and second result sets (361, 362) in the form of a 
first bit map (BM-n) and a second bit map (BM-n+2), 
respectively, and each result flag (C-FLAG1, 
C-FLAG2) is represented by a bit being related to 
its corresponding InfoCourse anchor through memory 
mapping. 

31. Data storage system according to claim 30, where 
the first result identification number (351) and 
the second result identification number (352) point 
to the memory start address of the first bit map 
(BM-n) and the second bit map (BM-n+2), 
respectively. 
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39. A data storage system (902) comprising: 

a memory to store data using a data structure with 
at least one instance of a first data type 
(InfoCluster) associated with: 
- instances of a second data type (InfoTypes) 
representing data attributes, and 

- instances of a third data type (InfoCourses), 
representing data records, associated with 
said InfoTypes, wherein the InfoTypes are 
arranged in a first tree structure, and 
wherein elements of an InfoCourse are arranged 
in a second tree structure, each InfoCourse 
(300, 301, 302, 303) having an InfoCourse 
anchor (310, 311, 312, 313); 

an interface to receive a query comprising a 
Boolean expression (500) that includes a first 
portion (501) and at least a second portion 
(502); and 

at least one processor to determine a result set 
(362) comprising result flags (C-FLAG2) in 
compliance with the second portion (502), each 
result flag being related to the corresponding 
InfoCourse anchor (310, 311); and to check for 
each InfoCourse (300, 301) that relates to the 
corresponding InfoCourse anchor (310, 311), 
whether the InfoCourse fulfils the first 
portion (501) of the Boolean expression. 

40. Data storage system according to claim 39, where 
the first portion (501) is a condition that the 
processor combines with the second portion (502) 
using the Boolean AND operator. 
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36. The data storage system of claim 35, further 
comprising: 

a retrieval time measuring component (104) to 
measure the time that is consumed by the first 
and second data retrievers (111, 112) for 
various numbers of hits; and 

a threshold calculator (105) to dynamically 
determine the threshold value on the basis of 
the results of the retrieval time measuring 
component (104) with respect to the first and 
second data retrievers (111, 112) and to feed 
back the determined threshold value into the 
threshold evaluator (103). 

37. The data storage system of claim 35 or 36, where 
the first data retriever (111) is implemented 
according to any one of the claims 28 to 29. 

38. The data storage system of claim 35 or 36, where 
the second data retriever (112) is implemented 
according to any one of the claims 30 to 33. 
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METHOD AND COMPUTER SYSTEM FOR QUERY PROCESSING 

Abstract of the Invention 

Method and computer system for retrieving data from a 
data storage system. The data storage system receives a 
query comprising a Boolean expression that includes a 
first portion and at least a second portion. The data 
storage system determines a first result set (361) 
comprising result flags (C-FLAG1) in compliance with 
the first portion and at least a second result set 
(362) comprising further result flags (C-FLAG2) in 
compliance with the second portion. Each result flag is 
related to a corresponding result identification number 
(351, 352) and is related to a corresponding InfoCourse 
anchor (310, 311, 312, 313). Then, the data storage 
system merges the first and the at least second result 
sets (361, 362) into a single result set (363). 

FIG. 10 
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41. Data storage system according to claim 40, where 
the processor deletes each result flag that relates 
to an InfoCourse anchor of an InfoCourse not 
fulfilling the condition. 
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